SUMMARY AND CONCLUSIONS

Experiments were conducted on three problems having general application to
Babcock butterfat test results, on one problem affecting only tests on milk received
in cans, and on four problems related to milk picked up in bulk tanks. The results
provide a basis for increased confidence in methods of testing and of verifying tests
of milk delivered to plants by milk producers. Highlights of the results are as
follows:

1. Variations among testers in measuring and reading the Babcock test frequently
result in test differences of one point on identical samples by two experienced
testers, but seldom result in differences greater than one point. Indications
are that over 98 percent of the tests on the same samples by pairs of testers can be
expected to be within one point of each other. Such agreement is found among testers
who regularly test large numbers of samples and among testers who frequently
compare results and methodology or have the same supervision. Technicians who
test only occasionally or who work under different supervision can be expected to
have test results that do not agree this closely.

The estimated part of the test variance which could be eliminated by using
only one pipetter ranged from 0 to 29 percent, and the part of the variance which
could be expected to be eliminated by using only one reader ranged from 0 to 41
percent.

2. In only one of the five markets where the relation between pipetting temperatures
and test results was studied was there a consistent and significant relationship
between pipetting temperatures and test results, with higher temperatures yielding
lower tests.

3. The number of times a composite sample was reheated was a more important
factor affecting level of test than the length of time the composite was stored.
The samples differing the most from the original composite tests were those held 5
days and reheated three times. On the average, they tested 0.13 lower in percent of
butterfat than the corresponding original composite samples.

4. In plants receiving milk in cans, when some method of agitation was used
to improve the mixing of the milk in the weigh tanks before sampling, from 0 to 36
percent of the samples differed in test by more than 0.1 percent (or 1 point) of
butterfat. Even among tanks with the same method of agitation, there was considerable
variation in the percentages of paired samples differing by more than 1
point. In 5 plants where weigh tanks were sampled for experimental purposes,
without previously agitating the milk, from 10 to 50 percent of the paired samples
differed by more than 1 point. Where the samples were taken following some form
of agitation, none of the average differences differed significantly from zero.

5. Differences between the weighted averages of producers' tests and the
test of milk taken from loaded tank trucks of milk varied considerably according
to the method of sampling from the tank truck. The experiments indicated that
representative samples can be obtained from loaded tank trucks only if the milk
has been agitated before sampling. In the experiments where some form of agitation
was used before sampling the tank, 97 percent or more of the samples from
the tank truck tested within 0.04 percent of butterfat of the weighted average of the
tests on the individual herd milks commingled in the tank truck.

6. The use of four drops of preservative in the sample bottle before pipetting
was found to have no significant effect on the tests of fresh samples.

7. Test results on composite samples built and pipetted at the farm were compared
with tests on composites built and pipetted at the laboratory, and tests on fresh
samples pipetted at the farm were compared with tests on fresh samples pipetted at
the laboratory. Results on samples pipetted at the farm did not differ significantly
from tests of samples pipetted at the laboratory. It was concluded, therefore, that
transportation of samples on the tank truck did not significantly affect the samples.
Usual precautions taken to preserve samples of bulk milk between sampling at the
farm and testing at the laboratory appear to be adequate. Exceptions can occur,
however, through carelessness of individual haulers, particularly on warm days.
The effect of such mistreatment of samples was not determined in this study.

8. When, for experimental purposes, some defective daily portions, such as
frozen or churned, were included in the composite sample, the spread between the
tests of the composite and the averages of the tests of the daily fresh samples was
greater in a significant number of cases than when all daily portions in the composite
were normal.


SELECTED  PROBLEMS  IN  BUTTERFAT  SAMPLING  AND  TESTING 


By Anthony G. Mathis, Robert W. Johnson, and
Elsie D. Anderson 1/

Marketing  Economics  Division 

Economic  Research  Service 


INTRODUCTION 

This is a report of eight studies of techniques in sampling milk and testing it for
butterfat content.

Most of the methods studied have been suggested by Federal order market
administrators or others as subjects needing research, because there was insufficient
reliable knowledge as to whether they cause significant differences in test
results.

If  duplicate  series  of  samples  are  tested  by  techniques  differing  even  slightly, 
the  average  percentages  of  butterfat  and  the  variations  around  these  averages  may 
be  different. 

To  lessen  the  impact  of  bias  and  test  variability  on  producers'  returns  for  milk, 
Federal  market  administrators,  producers'  organizations,  and  some  State  agencies 
check  the  butterfat  tests  performed  by  plants.  Market  administrators  also  test 
fluid  milk  products  of  each  plant  to  verify  the  plant's  report  of  milk  usage.  Plants 
buying milk from other plants check the seller's statement of butterfat content.

As an administrative necessity, an official test is regarded as correct, and
undue deviation of the plant tests from this norm is subject to correction. In the
case of two plants, one buying, the other selling milk, differences in the plants'
test results are a subject for negotiation before settling for the milk. As a practical
matter, the comparison between the tests must be defined in terms of a range
about the check-test within which a plant's test is acceptable. This range of
acceptability is necessary in part because the Babcock test, which is the accepted test
for butterfat in the United States, is accurate only within bounds. One of the factors
limiting the accuracy of the Babcock test is inaccuracy in the calibration of glassware
used in the test (4, 5). 2/ The Association of Official Agricultural Chemists makes
very specific recommendations as to the type of glassware that should be used.
However, a number of States have no specifications for glassware (9). 3/

1/ Fred Stein, formerly with Marketing Economics Division, and now with the Dairy
Division, Agricultural Marketing Service, had an active part in planning this work and
made arrangements with market administrators and other sources of original material
for the collection of data for this report.

2/ Underscored figures in parentheses refer to items in Literature Cited, page 33.

3/ In order to minimize differences caused by glassware, all glassware used in
the present study was officially calibrated by State Agricultural Colleges or qualified
laboratories.

Variations in test results, beyond those inherent in the Babcock test, also may
be caused by a number of factors, such as slight differences between individuals in
performing testing routines, differences in the representativeness of samples, and
slight differences in procedures. Variations of this kind can be controlled, and to
this end the present study is addressed. Specifically, the task of the present study
is:

1. To determine how and to what extent the use of different techniques in sampling
and testing may affect test results.

2. To indicate biases and variations in test results along with probability that
variations of a given size will occur when a given technique is used.

These objectives should afford persons or organizations interested in butterfat
tests a broader base of empirical knowledge for deciding the acceptability of various
procedures and for establishing limits within which tests on the same sets of samples
can be expected to agree.

Most of the studies included in this report involved closely controlled procedures.
Comparisons, therefore, usually were made on a limited number of samples. The
studies have been grouped into general problems in butterfat sampling and testing,
a sampling problem for plants receiving milk in cans, and problems related to milk
picked up in bulk tank trucks. In general, analysis of variance was used for statistical
comparisons of results from different methods of sampling or testing. The
number of observations, plants, and markets differed from one study to another. Also,
the statistical methods used in the analysis differed among studies. The methods
followed will be identified in the discussion of each study. Generally, the findings are
expressed by giving the proportions of test results that can be expected to agree or
differ by stated amounts.

These studies were concurrent with a larger research project in which about
230,000 milk samples were tested from deliveries of 1,700 producers for an average
of 5.5 months to 21 "plants" in 9 markets. 4/ The principal objective of the larger
project was to determine how much variability in producers' tests can be expected
under certain environmental conditions, whether this expected variability is constant
from season to season and market to market, and how butterfat sampling and testing
programs can be organized to take into account this normal variability.

PROBLEMS   WITH   APPLICATION  FOR  ALL  BUTTERFAT  SAMPLING  AND  TESTING 

Three  of  the  problems  considered  in  this  study  have  general  application  to  Babcock 
test  results.  These  problems  pertain  to  the  actual  testing  of  milk  in  the  laboratory: 
differences  among  testers  and  departures  from  standard  methods,  specifically  pipetting 
temperatures  and  techniques  affecting  tests  on  composite  samples. 

Differences  Among  Testers 

The  procedures  involved  in  the  Babcock  test  are  carefully  defined  to  limit  the 
possibilities  for  differences  in  test  results  on  the  same  sample  among  different 
testers  or  by  the  same  tester  in  repeated  testing.  However,  there  remain  possibilities 
for  differences  in  tests  because  of  varying  personal  abilities  of  the  individual  testers 
to  make  measurements  and  read  tests  uniformly. 


4/ The can and bulk operations in each of 2 plants were considered to be separate
operations, so that the 21 "plants" represented 19 establishments.

Results of experiments carried out by Herreid and others indicate that differences
in test results can be lessened, in many cases, by more careful supervision and
attention to techniques (5, 6, 7, 8). Herreid points out that many testers fill their
pipettes to the point where the lowest part of the meniscus is level with the mark,
although Babcock considered that the pipette was full when the milk touched the
mark (5).

A number of studies show that the use of glymol to eliminate the meniscus would
lessen variability in reading tests. Herreid found that increasing the size of sample
from 18 to 18.36 grams as well as using glymol would bring the Babcock method into
closer agreement with the Rose-Gottlieb method. Lampert, Nelson, and Wilster (13)
and Herreid (5) suggest the use of reading devices that would improve accuracy.

A recent study involving tests of duplicate samples by six technicians in different
laboratories affords some measure of the ability of testers to reproduce Babcock test
results (11). This study, in which the tests were read to the nearest one-hundredth
of one percent, showed that in two-thirds of replicated tests, the same tester would
get results from one test within 0.046 percent butterfat of another test on the same
sample. From this figure one can deduce that 95 percent of paired comparisons of
tests which have been read to the nearest 0.01 percent would be within 0.092 percent,
and 99 percent would be within 0.138 percent. The standard deviation of the difference
between two readings which would be expected due to the rounding of test readings
to the nearest 0.01 percent would be ±0.0047. It appears that rounding was a minor
factor in the differences between readings for these comparisons.
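
The step from the two-thirds figure to the 95 and 99 percent figures can be illustrated with a short sketch. It assumes that the differences are roughly normally distributed and that 0.046 can be treated as one standard deviation; the report implies both but states neither, and the function name is illustrative only.

```python
from statistics import NormalDist

# One standard deviation of the difference between replicate tests by the
# same tester, in percent butterfat, as quoted from reference (11).
SD_DIFF = 0.046


def coverage(k: float) -> float:
    """Share of differences within +/- k standard deviations (normality assumed)."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    limit = k * SD_DIFF
    print(f"within {limit:.3f} percent butterfat: {coverage(k):.1%} of pairs")
```

Under these assumptions the one, two, and three standard deviation limits correspond to about two-thirds, 95 percent, and 99 percent of paired comparisons, which is the pattern of multiples used in the text.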

Babcock tests are almost always read to the nearest one-tenth of one percent.
Each reading involves a maximum error of ±0.05, due to rounding. If the tests being
rounded are distributed uniformly over the 0.1 percent interval, or from 0.05 below to
0.05 above the rounded reading, two-thirds of the tests would be included in the interval
from 0.033 below to 0.033 above the rounded reading. Based on this estimated standard
deviation of 0.033 for the rounding of individual test readings, the standard error of an
average of 30 daily tests, due to rounding, would be ±0.006, and the standard deviation
of the difference between two tests which might be attributable to the rounding
procedure would be ±0.047.
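
The arithmetic behind these rounding figures can be retraced as follows. This is a sketch only: it takes the report's working figure of 0.033 as the standard deviation of a single rounding error and assumes that the rounding errors of separate readings are independent. The textbook standard deviation of a uniform error is computed alongside purely for comparison; it is not a figure from the report.

```python
import math

HALF_INTERVAL = 0.05   # maximum rounding error when reading to 0.1 percent
SD_ROUNDING = 0.033    # report's working figure (two-thirds of the half interval)

# Standard error of an average of 30 daily tests, due to rounding alone.
se_mean_30 = SD_ROUNDING / math.sqrt(30)

# Standard deviation of the difference between two independently rounded tests.
sd_difference = SD_ROUNDING * math.sqrt(2)

# For comparison only: textbook standard deviation of an error spread
# uniformly between -0.05 and +0.05 (half-width divided by sqrt(3)).
sd_uniform_exact = HALF_INTERVAL / math.sqrt(3)

print(f"standard error of 30-test average: {se_mean_30:.3f}")
print(f"standard deviation of a difference: {sd_difference:.3f}")
print(f"textbook uniform standard deviation: {sd_uniform_exact:.4f}")
```

The first two values round to the 0.006 and 0.047 given above. The textbook value is somewhat smaller than 0.033, so the report's figures err slightly on the conservative side.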

The question of a tester's personal bias in the reading of a test must also be
considered. A small amount of bias will not affect the reading of each individual
test, although it affects the average reading of a group of tests by the amount of the
bias when the usual rounding procedures are followed, if the tests are evenly
distributed over the rounding interval. With a 0.01 percent bias, an average of 1 in
every 10 tests would be expected to differ by 0.1 percent butterfat (or 1 point) after
rounding to the nearest 0.1 percent. The average of the 10 tests would then be changed
by 0.01 percent or by the amount of the bias. A bias of 0.02 percent would be
reflected by a 0.1 percent (or 1 point) difference on 2 in every 10 readings. The average
of the 10 tests would then be changed by 0.02 percent or by the amount of the bias.

The effect of a bias on butterfat test readings and averages can be verified by
starting with a series of 10 true readings such as 4.00, 4.01, to 4.09, rounding them,
and comparing the rounded percentages and their average with the rounded percentages
and average of a series in which the same true readings have had the bias added
(or subtracted) before rounding and averaging. For example, adding a 0.01 percent
bias to the example above would change the series to 4.01, 4.02 to 4.10 and would
increase the average of the rounded percentages from 4.04 to 4.05.
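
The verification described above can be carried out in a few lines. The sketch below assumes that a reading falling exactly halfway between two tenths (such as 4.05) is read down, since that tie rule reproduces the averages of 4.04 and 4.05 quoted in the example; the report does not state the rule, and the function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def read_to_tenth(value: Decimal) -> Decimal:
    """Round a true test to the nearest 0.1 percent; exact ties go down (assumed)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_DOWN)


def average(values):
    values = list(values)
    return sum(values) / len(values)


# Ten true readings: 4.00, 4.01, ..., 4.09 percent butterfat.
true_tests = [Decimal("4.00") + Decimal("0.01") * i for i in range(10)]


def compare(bias: Decimal):
    """Return (average without bias, average with bias, readings changed)."""
    plain = [read_to_tenth(t) for t in true_tests]
    biased = [read_to_tenth(t + bias) for t in true_tests]
    changed = sum(p != b for p, b in zip(plain, biased))
    return average(plain), average(biased), changed


for bias in (Decimal("0.01"), Decimal("0.02")):
    avg_plain, avg_biased, changed = compare(bias)
    print(f"bias {bias}: average {avg_plain} -> {avg_biased}, "
          f"{changed} of 10 readings changed by 1 point")
```

With a bias of 0.01 one reading in ten moves by a point and the average rises by 0.01; with a bias of 0.02 two readings move and the average rises by 0.02, in line with the preceding paragraphs.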

Information about the numbers of differences of a given size is more important
than average differences. Such frequency distributions afford a basis for deciding
when the disagreement between two sets of results covering the same producers'
milk is within bounds that may normally be expected.

Test results tend to differ more among technicians who test occasionally than
among those who regularly test large numbers of samples. Also, results appear to
vary more among testers who do not regularly work together, or who work under
different supervision, than among testers who frequently compare results and
methodology or have the same supervision (table 1). In one report it was suggested that
"...psychology may influence a test. A majority of testers are subject to influence
and suggestion...Testers working under too critical scrutiny may readily be
influenced by the attitude of employers...." 5/

In 7,192 comparisons of paired results on identical samples, by testers
accustomed to working together, 74 percent agreed, 25 percent differed by 1 point,
and 1 percent differed by 2 points or more (table 1). In the 6 experiments with 4
testers in each, tests agreed on 69.4 to 81.9 percent of the paired samples (table 2).
These comparisons indicate rather clearly that variations in measuring and reading
the Babcock test frequently result in test differences of one point on identical samples
by two experienced testers, but seldom result in differences exceeding one point.
More than occasional differences larger than one point warrant an examination of the
sampling or testing procedures used by the testers.
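
The number of paired comparisons in the 6 experiments follows from counting the pairs of testers. A brief sketch, in which the tester labels are hypothetical placeholders and the sample counts are those given for these experiments (4 testers, 12 samples each):

```python
from itertools import combinations

testers = ["A", "B", "C", "D"]   # hypothetical labels for the 4 testers
samples_per_experiment = 12
experiments = 6

# Every pair of testers is compared once on every sample.
pairs_of_testers = list(combinations(testers, 2))
comparisons_per_experiment = len(pairs_of_testers) * samples_per_experiment
total_comparisons = comparisons_per_experiment * experiments

print(len(pairs_of_testers), comparisons_per_experiment, total_comparisons)
```

Six pairs of testers on 12 samples give 72 comparisons per experiment and 432 for the 6 experiments; added to the 6,760 comparisons between tester and check-tester, this gives the 7,192 comparisons cited above.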

5/ Unpublished report of a refresher course for butterfat testers held at the
University of Minnesota, Feb. 1958.

Table 1.--Percentage of duplicate samples of milk given same and different Babcock
readings by two testers, by working relationship of testers

                                            Pairs of    Pairs of    Pairs of tests differing by:
Description of testers                      duplicate   tests in
                                            samples     agreement   1 point   2 points   Over 2 points

                                            Number      Percent     Percent   Percent    Percent
Testers working closely together:
  Tester and check-tester, 1 market 1/       6,760       74.2        24.6       1.0        0.2
  Market administrators' testers,
    6 markets 2/                               432       76.9        22.9        .2        0.0
      Total                                  7,192       74.4        24.5        .9         .2

Testers not accustomed to working
together:
  31 technicians from various plants 3/      1,846       54.1        36.2       7.9        1.8
  8 research technicians 4/                  1,834       26.9        42.9      19.2       11.0
      Total                                  3,680       40.6        39.5      13.5        6.4

1/ Duplicate samples of milk taken from a storage tank at the same time by tester and
check-tester. Most of the testing was done by two men.

2/ 6 experiments, 4 testers in each, using subsamples of 12 samples.

3/ Chiefly plant testers at refresher course at the University of Minnesota, using
subsamples of 4 samples.

4/ Technicians who did occasional testing; each from a different agency.

Table 2.--Percentage of duplicate samples of milk given same and different Babcock
readings by two testers, 6 experiments

                 Pairs of    Pairs of        Pairs of tests differing by:
Experiment       duplicate   tests in
                 tests 1/    agreement 2/    1 point   2 points   Over 2 points

                 Number      Percent         Percent   Percent    Percent
Number 1            72        81.9            18.1       0          0
Number 2            72        81.9            18.1       0          0
Number 3            72        77.8            22.2       0          0
Number 4            72        76.4            23.6       0          0
Number 5            72        73.6            25.0       1.4        0
Number 6            72        69.4            30.6       0          0
  Total            432        76.9            22.9        .2        0

1/ In each experiment, 4 operators tested 12 samples; this affords 72 comparisons of
readings by two testers.

2/ Tests differing by from -.05 to +.05 were considered to be in agreement.

Part of the differences among testers' results on duplicate samples is due to
small variations in techniques. Given uniform techniques, some differences can be
controlled by improving testers' skills and care. In the present study several
experiments were made to determine how much of the variation was caused by differences
in pipetting the sample into the test bottle and by differences among testers in
reading the completed test.

In each of five markets tests on duplicate samples were prepared by two pipetters
and read independently by two readers. In a sixth market, tests on samples in 6
experiments were prepared by one pipetter and read independently by from 5 to 15 readers.

Differences between testers in pipetting samples caused highly significant
differences in testers' results in three of five experiments. In the other two
experiments, differences were too small to be significant (table 3, markets 1, 2, 5, 6, 8).
In these five experiments, differences among testers in reading tests were highly
significant in only one trial. In each of the 6 experiments in which one man pipetted
and prepared tests and several technicians read each test result, differences among
the individual readers were highly significant (table 3, Market 9).

The estimated amount of test variance which could be eliminated by using only
one pipetter ranged from 0 to 29 percent. Three out of five of the experiments
produced estimates of 19 percent or less. That part of the variance which could be
expected to be eliminated by using only one reader ranged from 0 to 41 percent. Seven
of 11 experiments yielded estimates of 21 percent or less (table 4).
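
These percentages are each variance component expressed as a share of the total within-sample variance. A minimal sketch of that calculation, using the components read from table 4 for the five markets with 2 readers and 2 pipetters; the dictionary layout and function name are illustrative and not part of the original analysis:

```python
# Within-sample variance components (fat percent squared), read from table 4.
components = {
    "Market 1": {"reading": 0.0006, "pipetting": 0.0000, "chance": 0.0023},
    "Market 2": {"reading": 0.0000, "pipetting": 0.0006, "chance": 0.0026},
    "Market 5": {"reading": 0.0008, "pipetting": 0.0004, "chance": 0.0026},
    "Market 6": {"reading": 0.0002, "pipetting": 0.0010, "chance": 0.0023},
    "Market 8": {"reading": 0.0002, "pipetting": 0.0010, "chance": 0.0028},
}


def share_eliminated(parts: dict, source: str) -> float:
    """Percent of total variance removed if one source of variation is eliminated."""
    total = sum(parts.values())
    return 100 * parts[source] / total


for market, parts in components.items():
    reader = share_eliminated(parts, "reading")
    pipetter = share_eliminated(parts, "pipetting")
    print(f"{market}: 1 reader {reader:.1f} percent, 1 pipetter {pipetter:.1f} percent")
```

For Market 1, for example, the reading component of 0.0006 out of a total of 0.0029 gives about 20.7 percent, and the pipetting component gives 0 percent.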


Pipetting  Temperatures 

It has been a common practice for testers to pipette fresh samples at 70° F.
and composite samples at 100° F. despite the fact that the Association of Official
Agricultural Chemists specifies 100° F. as the standard pipetting temperature for
both kinds of samples (12). In fact, pipetting temperatures recommended in State
regulations have varied widely (table 5).

Table 4.--Variance in test results due to differences in pipetting milk samples and in
reading the Babcock test

                                    2 readers, 2 pipetters, 12 samples each in--
Item                                Market 1   Market 2   Market 5   Market 6   Market 8

                                    Fat %      Fat %      Fat %      Fat %      Fat %
Within-sample variance due to:
  Reading                           0.0006     0.0000     0.0008     0.0002     0.0002
  Pipetting                          .0000      .0006      .0004      .0010      .0010
  Chance                             .0023      .0026      .0026      .0023      .0028
    Total                            .0029      .0032      .0038      .0035      .0040

                                    Percent    Percent    Percent    Percent    Percent
Estimated percentage of variance
which could be eliminated by
using only:
  1 reader                          20.7        0         21.1        5.7        5.0
  1 pipetter                         0         18.8       10.5       28.6       25.0

                                    1 pipetter, each read by--
                                    [headings legible in source: 5, 7, 8, 8, and 15 readers;
                                    one of the six column headings is missing]

                                    Fat %      Fat %      Fat %      Fat %      Fat %      Fat %
Within-sample variance due to:
  Reading                           0.0003     0.0006     0.0007     0.0003     0.0002     0.0005
  Chance                             .0013      .0017      .0010      .0021      .0018      .0012
    Total                            .0016      .0023      .0017      .0024      .0020      .0017

                                    Percent    Percent    Percent    Percent    Percent    Percent
Estimated percentage of variance
which could be eliminated by
using only:
  1 reader                          18.8       26.1       41.2       12.5       10.0       29.4

Previous work has shown that the tests of fresh samples give results significantly
higher than those of composite samples (10, 17). Theoretically, a high
pipetting temperature could cause a lower test than a low pipetting temperature,
because, with volume constant at 17.6 ml., a smaller weight of warm milk than of
cold milk would be delivered into the test bottle. This would result in a smaller
quantity of fat in the neck of the Babcock test bottle and a lower butterfat test reading.
Lower viscosity at high temperatures might offset the change in volume, by causing
less fat to adhere to the walls of the pipette. Consequently more complete delivery of
the pipetted sample into the test bottle would occur than at lower temperatures. One
purpose of this work was to determine whether pipetting temperatures are a source
of downward bias in composite tests.

In  some  laboratories  fresh  samples  are  pipetted  with  no  attempt  to  standardize 
the  pipetting  temperature.  Differences  in  pipetting  temperatures  that  could  occur  in 
the  absence  of  standardizing,  theoretically,  could  cause  day-to-day  variation  in  a 
producer's  tests  and  explain  part  of  any  differences  between  tests  on  individual 
samples,  and  between  plant  tests  and  check  tests.  This  suggested  a  second  purpose 
of  the  work  on  pipetting  temperatures,  to  determine  whether  differences  in  fresh 
tests  occurring  under  the  usual  range  of  temperatures  found  in  plants  and  laboratories 
would  be  significant. 
Table 5.--Pipetting temperatures for milk samples specified in testing procedures
required in various States, 1953 1/

Pipetting temperatures (°F.)

50° - 70°
50° - 100°
55° - 65°
55° - 70°
60° - 68°
60° - 70°
60° - 100°
65° - 75°
68°
70° - 95°
85° - 100°
90°
95° - 100°
100°
Cool to 70°
About 70°
Not over 110°
Warm
Not specified

Number of States

1
1
2
3
1
4
1
1
14

1/ (^> p. 13.) Apparently these pipetting temperatures apply to both fresh and
composite samples.


Work by the U. S. Bureau of Standards shows that if milk with 4.0 percent butterfat
is assigned a volume of 1.0 at 68° F., the volume would be 1.0020 at 80° F.,
1.0040 at 90° F., and 1.0065 at 100° F. (21). This difference in volume results in an
amount of fat delivered into the testing bottle at 100° F. equal to 99.35 percent of the
amount delivered at 68° F., assuming that surface tension would be equal at both
temperatures. Since the surface tension of milk is lessened at higher temperatures,
it is probable that the weight of milk delivered at 68° and 100° would be closer than
the relation indicated above (20).
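
The 99.35 percent figure is the reciprocal of the relative volume at 100° F. The sketch below retraces it and, as an illustration only, applies it to a hypothetical 4.0 percent test. It assumes a pipette delivering a fixed volume and ignores the surface tension and viscosity effects just mentioned, so it marks an upper limit on the expected effect rather than a prediction.

```python
# Relative volume of 4.0 percent milk by temperature (deg F), with 68 deg F = 1.0,
# as quoted from the Bureau of Standards work.
relative_volume = {68: 1.0000, 80: 1.0020, 90: 1.0040, 100: 1.0065}


def relative_fat_delivered(temp_f: int) -> float:
    """Fat delivered by a fixed-volume pipette, relative to delivery at 68 deg F."""
    return relative_volume[68] / relative_volume[temp_f]


def expected_test(test_at_68: float, temp_f: int) -> float:
    """Test expected at temp_f if volume expansion were the only effect (assumed)."""
    return test_at_68 * relative_fat_delivered(temp_f)


for temp in sorted(relative_volume):
    share = 100 * relative_fat_delivered(temp)
    print(f"{temp} deg F: {share:.2f} percent of fat delivered at 68 deg F, "
          f"4.0 percent milk would test {expected_test(4.0, temp):.3f}")
```

On these assumptions a 4.0 percent milk pipetted at 100° F. would test about 0.026 percent lower than at 68° F., a difference well under the 0.1 percent to which tests are normally read.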


The effect of pipetting temperature on test results has been measured in several
experiments. Wilster and Robichaux found that there was no difference between the
averages of tests on 12 samples pipetted at 68° F. and at 80° F. A pipetting
temperature of 100° F. gave an average fat reading which was 0.05 percent lower than
that for 68° F., and 120° F. gave an average fat reading which was 0.08 percent
lower than that for 55° F. (21). Dahlberg found the average fat reading of tests on
6 samples pipetted at 120° F. was 0.01 percent higher than the average of tests on
the same samples pipetted at 70° F. (3). Bailey found that the average weight of
milk delivered at 70° F. was 17.937 grams and at 115° F. was 0.123 gram less. He
stated that "on the average reading of 4.51 percent this would amount to 0.046
percent" (2). A bias of only 0.02 percent butterfat, however, could cause a difference
of 1 point in 20 percent of tests. Therefore, it seemed necessary to carry out
additional work on the effect of pipetting temperatures on test results.

In order to obtain clear evidence on the effect of pipetting temperature on test
results, 185 fresh and composite samples from five markets were each pipetted at
five different temperatures, 60°, 70°, 80°, 90°, and 100° F., and each subsample was
tested. The results of these tests for all markets were pooled and analyzed to
determine if temperature of pipetting has an effect on the fat test.

Although statistically significant differences frequently occurred among the tests
on the samples pipetted at each of the five pipetting temperatures and between the tests
on the samples pipetted at 70° and 100° F., there was no consistent tendency for the
higher pipetting temperature to give lower or higher tests (tables 6 and 7).

Both of the experiments made in Market 9 showed a consistent and highly
significant inverse relation between pipetting temperatures and test results (table 7). No
other market had results that showed a consistent and firm relationship. This suggests
that a relationship between pipetting temperatures and test results might be difficult
to establish under industry conditions.

Storing  and  Reheating  Composite  Samples 

Thirty-six States require that dairy plants retain composite samples after the end
of the compositing period for times ranging from 1 to 12 days (9). The holding time
affords regulatory agencies an opportunity to verify the accuracy of dealers' tests on
such samples. Some groups have objected to the regulations on the basis that butterfat
samples deteriorate with time so that results of check-tests made after the
compositing period are inaccurate.

Demonstration that either storage time or heating and cooling significantly
affects the level of test results would furnish an objective basis for reconsidering
regulations requiring storage of composite samples after the end of the compositing
period, or for establishing tolerances between results of check-tests and the initial
composite tests.

Studies comparing results of tests on fresh samples with results of tests on 7-day,
10-day, and 15-day composites showed that test results for composite samples tended
to be lower than the average of results for fresh samples for the same day, and that
the spread between fresh and composite tests tended to increase with the number of
days in the compositing period (10, 17). This suggests that storage time may affect
results of the Babcock tests.

After the original test is made on a composite sample within 24 hours after the
end of the compositing period, samples to be held for retesting are immediately
cooled and stored in a refrigerator. Before a second test can be made, the sample
must be taken out of storage and reheated to the appropriate pipetting temperature
(95°-100° F. for all tests in these experiments).

In order to find out the effects of reheating a composite sample for a retest after
the compositing period, eight controlled experiments were set up in four markets.
In each of these experiments milk samples were obtained for 24 producers, and
composite samples were prepared for each producer. After testing at the end of the
compositing period (treatment A), the remainder of each original composite sample
was divided into 3 parts which were cooled and stored for retesting as follows:
reheated once for testing after 1 day (treatment A1); 2 days (treatment B1); and 5
days (treatment C1). The first subsample was cooled and reheated a second time
for testing after 2 days (treatment B2), and cooled and reheated a third time for
testing after 5 days (treatment C3).

Tests of composite samples made 1, 2, and 5 days after the end of the compositing
period were appreciably lower in butterfat than tests on the same samples at the end
of the compositing period (tables 8 and 9). About 95 percent of the tests made 1 day
after the end of the compositing period (treatment A1) were equal to or less than the
tests at the end of the compositing period, as were 93 percent of the tests made 2
days after the end of the compositing period (treatments B1 and B2), and 99 percent
of the tests made 5 days later (treatments C1 and C3).

Table 8. Average difference in butterfat test of composited milk samples, by days held
and times reheated

                                    Average difference from original
                                    composite test (treatment A) 1/
                                    All           7-day         10-day
                            Days    composites    composites    composites
Treatment                   held    2/            2/            2/

                            Number  Butterfat     Butterfat     Butterfat
                                    percent       percent       percent
Reheated once:
  A1                        1       -0.03 r       -0.02 r       -0.03 r
  B1                        2       -0.04 rs      -0.01* r      -0.05 s
  C1                        5       -0.07 t       -0.06 r       -0.07 t
Reheated twice:
  B2                        2       -0.05 st      -0.02* r      -0.07 t
Reheated three times:
  C3                        5       -0.13 u       -0.15 s       -0.13 u

1/ Letters "r" - "u" indicate statistical significance. Average difference followed
by letter "r" is significantly different from those differences in the same column not
having "r"; those followed by "s" are significantly different from those not having
"s", etc.

2/ Each average difference has been shown by a t-test to be very highly significant
(except for the 2 in 7-day column marked with asterisks to indicate no significance);
that is, on the average, the test on each sample tested after the end of the
compositing period was lower than the original composite test by an amount which could be
expected to occur in not over 1 percent of the trials due to chance alone.


The number of times a sample was reheated was a more important factor
than the length of storage. Of the three sets of tests made with only one reheating,
only those held 5 days (treatment C1) were significantly different from the other
two sets. On the average, treatment C1 resulted in butterfat tests about 0.07 percent
lower than the original composite (treatment A) (table 8).

The samples held for 1 day (treatment A1) and those held for 2 days (treatment
B1) averaged lower than the original composite by 0.03 and 0.04 percent, respectively.
They were close enough to each other, however, to represent differences which had a high
probability of occurrence due to chance, and the effects of the two treatments could
not be considered to be different.

The tests made on samples held for 2 days and reheated twice (treatment B2)
averaged 0.05 lower than the original composite test. This is not significantly
different from the average for treatment B1, 0.04, held the same length of time but
reheated only once, or from the average of treatment C1, 0.07, held 5 days but
reheated only once.

The samples differing the most from the original composite tests were those
held 5 days and reheated three times (treatment C3). Their average difference of
0.13 was significantly lower than the differences for any of the other four types of
treatments (table 8).

The downward effect on test results of reheating suggests that allowing composite
samples to stand at room temperature during any part of the compositing period is a
possible cause of downward bias in composites as compared with results on fresh
samples.

The  averages  in  table  8  afford  check-testing  agencies  some  measure  of  the 
tolerances  appropriate  when  check-testing  is  delayed  after  the  compositing  period 
has  ended. 
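
As a rough illustration of how a check-testing agency might apply the averages in
table 8, the sketch below adjusts an original composite test by the average
difference for the days held and times reheated. The helper name and the sample
test of 3.80 percent are assumptions made for illustration; only the five average
differences (all composites) come from the results reported above.

```python
# Average differences (percent butterfat) for all composites, as reported in
# the text and table 8, keyed by (days held, times reheated).
AVERAGE_DIFFERENCE = {
    (1, 1): -0.03,  # treatment A1
    (2, 1): -0.04,  # treatment B1
    (5, 1): -0.07,  # treatment C1
    (2, 2): -0.05,  # treatment B2
    (5, 3): -0.13,  # treatment C3
}


def expected_check_test(original_test, days_held, times_reheated):
    """Return the check-test level expected on average (hypothetical helper)."""
    adjustment = AVERAGE_DIFFERENCE[(days_held, times_reheated)]
    return round(original_test + adjustment, 2)


# Hypothetical composite that tested 3.80 percent at the end of the period,
# then was held 5 days and reheated three times before the check-test.
print(expected_check_test(3.80, 5, 3))
```

A lookup of this kind gives only the average shift; it says nothing about the
spread of individual differences around that average.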


THE  BLENDING  OF  CAN  MILK  IN  PLANT  WEIGH  TANKS 

In most plants receiving milk in cans, samples for Babcock and other tests are
taken from the weigh tank as the cans from each producer's delivery are dumped
and weighed. Should the milk be inadequately blended in the weigh tank before the
sample is taken, it may not be representative of the producer's total delivery.

Weigh tanks may vary considerably in their ability to blend milk. Samples
taken at any one place in the tank may not be representative of the entire contents
of the tank. In markets where plants account to producers' organizations or market
administrators for milk intake and fat tests, weigh tanks usually are required to meet
standards for mixing ability. Nevertheless, under normal operating conditions,
differences occur among weigh tanks in their blending of milk and these can affect
butterfat test results. The amount of such variability could cause significant differences
in results from two samples of the same milk, where each was taken from a different
place in the weigh tank (17, 1_, 14, 15, 16, 19).

This consideration led to analysis of the blending ability of weigh tanks used in
eight plants where butterfat tests were made for the present study. For this limited
study of blending ability, one series of samples was taken from the place in the tank
that the plants ordinarily used. Samples were also taken from one to four other
places in the tanks. Test results for the samples from each position in the tanks
were compared and were analyzed statistically to determine whether differences
were greater than could be expected by chance alone.

For weigh tanks where some method of agitation was followed to improve the
blending ability of the tanks before sampling, from 0 to 36 percent of the samples
from one position differed in test by more than 1 point from samples from another
position (table 10). These percentages varied sharply among tanks with the same
method of agitation (plants 1 and 4; 12 and 14). One weigh tank showed a relatively
high proportion of tests (36 percent) differing by more than 1 point, and the largest
average difference between positions, 0.0764. Statistically, this average difference
was not significantly different from a zero difference, and did not represent a "bias"
between the two positions in the tank. The average difference, though large, could
not be considered significant because it is not larger than one would expect on the
basis of the variation in the size of the individual differences. 6/

In  5  plants  weigh  tanks  were  sampled  without  previously  agitating  the  milk. 
For  these  tanks,  from  10  percent  to  50  percent  of  the  tests  on  samples  from  one 
position  differed  by  more  than  1  point  from  those  on  samples  from  a  second  position 
(table 10).


6/ In this study the designation, "significant difference", means that differences
as large or larger than the one occurring would be expected by chance alone in not
more than 5 percent of repeated trials. "Highly significant difference" means that
differences as large or larger than the one occurring would be expected by chance
alone in not more than 1 percent of repeated trials.

In one plant where the experiment was done first without agitation, a second trial
was made, taking samples after hand agitation (plant 14). After agitation, 33 percent
of the differences were greater than 1 point and the average difference of 0.0333 was
not significant; without agitation, 50 percent of the differences were greater than 1
point and the average difference of -0.0386 was significantly different from a zero
average difference.

When tests on samples from two positions are in fact equivalent except for random,
or chance, variations in sampling and testing, the average difference in the paired tests
can be expected to fluctuate around zero. Average differences obtained were tested by
a t-test to determine the probability that an average as large as or larger than the one
obtained (and, therefore, as different from zero) would occur by chance, given the
variation of the individual differences. When the probability is 5 percent
or less, the average difference is considered to be significantly different from zero and
to represent a "bias" between the two positions being compared. The two positions
being compared differed significantly in two of the four experiments where samples
were taken without agitating the milk, and without the use of the "milk thief," and in
none of the five trials where samples were taken following some form of agitation.
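
The t-test described above can be sketched as follows. This is a minimal
illustration, not the computation used in the study: the two sets of paired
differences are invented, and the critical value 2.262 is the standard two-tailed
5 percent point of the t distribution for 9 degrees of freedom, which applies only
because each invented set has 10 pairs.

```python
import math
import statistics


def paired_t_statistic(diffs):
    """Return (mean difference, t statistic, degrees of freedom).

    Tests the hypothesis that the true average difference is zero, using
    the variation of the individual differences.
    """
    n = len(diffs)
    mean = statistics.fmean(diffs)
    std_dev = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    std_error = std_dev / math.sqrt(n)
    return mean, mean / std_error, n - 1


# Hypothetical paired differences (position 1 minus position 2), in percent
# butterfat. The first set balances out; the second leans one way.
balanced = [0.1, -0.1, 0.0, 0.1, 0.0, -0.1, 0.1, 0.0, 0.0, -0.1]
one_sided = [-0.1, 0.0, -0.1, -0.1, 0.0, -0.1, -0.1, 0.0, -0.1, -0.1]

T_CRITICAL_5_PERCENT_DF9 = 2.262  # two-tailed, 9 degrees of freedom

for label, diffs in (("balanced", balanced), ("one-sided", one_sided)):
    mean, t, df = paired_t_statistic(diffs)
    biased = abs(t) > T_CRITICAL_5_PERCENT_DF9
    print(label, round(mean, 3), round(t, 2), df, biased)
```

In this sketch only the one-sided set exceeds the critical value, which is the
sense in which a consistent difference in one direction represents a "bias."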

The percentage of samples from two positions disagreeing by more than one point
is probably a better measure of the mixing ability of weigh tanks than the significance
of the average difference. Plus and minus differences between samples taken from
different positions may balance each other so that the average difference does not
represent a statistically significant "bias," but either plus or minus differences of
over one point would reflect incomplete mixing. When milk has been thoroughly
blended, there is no reason to expect tests on samples from two positions to vary more
than do two tests by one tester on a single sample. Two tests by the same tester on
one sample of milk can be expected to differ by more than one point in less than 2
percent of the comparisons. (See pages 9 and 10.)
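
The contrast between the two measures can be shown with a small sketch. The tests
below are invented and are expressed in points, on the assumption that one point
equals 0.1 percent butterfat (so a 3.7 percent test is entered as 37); the function
name is likewise hypothetical.

```python
def percent_differing_by_more_than(tests_a, tests_b, points=1):
    """Percentage of paired tests that disagree by more than `points`."""
    over = sum(1 for a, b in zip(tests_a, tests_b) if abs(a - b) > points)
    return 100.0 * over / len(tests_a)


# Hypothetical tests, in points, on samples from two positions in a weigh tank.
position_1 = [37, 38, 40, 36, 41, 39]
position_2 = [39, 38, 38, 36, 41, 39]

differences = [a - b for a, b in zip(position_1, position_2)]
average_difference = sum(differences) / len(differences)
share_over_one_point = percent_differing_by_more_than(position_1, position_2)

# The plus and minus differences cancel in the average, yet a third of the
# pairs disagree by more than one point.
print(average_difference, round(share_over_one_point, 1))
```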

PROBLEMS  RELATED  TO  PICK  UP  OF  MILK  IN  BULK  TANKS 

The  widespread  adoption  of  bulk  handling  of  milk  has  brought  with  it  a  need  to 
develop  a  system  for  sampling  and  testing  milk  that  protects  both  plant  and  producer, 
at an acceptable cost, against added variability in butterfat tests. This concern refers
to  variations  other  than  the  inherent  day-to-day  change  in  butterfat  content.  Under  the 
bulk  tank  system  of  hauling,  farm  milk  is  commingled  at  the  farm  instead  of  the  plant. 
This  makes  it  necessary  to  take  milk  samples  for  butterfat  testing  at  each  farm,  before 
the  milk  is  pumped  from  the  farm  tank  into  the  tank  truck.  Bulk  tank  milk  usually  is 
sampled  in  one  of  two  ways:  (a)  A  sampler  rides  the  tank  truck  and  takes  a  sample  at 
each  farm,  or  (b)  the  driver  of  the  truck  takes  the  sample. 

The  first  way  of  sampling  is  expensive  and  usually  impractical  because  one 
sampler  can  visit  relatively  few  farms  daily.  It  is  doubtful  if  many  plants  would 
consider  taking  samples,  especially  daily  samples  for  compositing,  in  this  high-cost 
fashion.  On  the  other  hand,  it  may  be  necessary  and  practical  to  obtain  random  fresh 
samples  for  check-testing  in  this  way. 

The least costly way to sample milk in farm bulk tanks is to have the hauler take
the sample. However, the representativeness of the sample, if taken in this way, may
be questioned on the basis that the hauler is not necessarily a skilled sampler, that he
may be careless in taking samples, or he even may be suspected of deliberately taking
unrepresentative samples. In some markets, haulers are required to pass State tests
and be licensed as milk samplers in order to ensure that they understand and can follow
acceptable methodology in taking samples. However, licensing a sampler does not
ensure that he will sample correctly, so a need persists to afford producers and plants
assurance against erroneous sampling.

Samples also may become unrepresentative through damage during transportation
from farm to plant. Samples may churn or freeze, so that composite samples
built at the plant are unrepresentative. Most bulk tank trucks have insulated sample
compartments. Sample bottles also are iced to minimize the possibility of damage to
samples during the warm months. However, even with these safeguards there is a chance
of damage to the samples.

Plants  and  check-testing  agencies  have  two  concerns  regarding  sampling  of  bulk 
milk  at  the  farm:  (1)  They  need  a  way  to  check  a  hauler's  sampling  without  incurring 
the  expense  of  an  official  sampler  on  each  bulk  tank  truck;  (2)  They  need  to  determine 
which  possible  causes  of  unrepresentative  samples  result  in  differences  so  small  that 
they can be ignored.

The  following  studies  were  made  to  assemble  information  about  these  problems 
and  to  evaluate  ways  which  have  been  suggested  or  used  to  meet  them. 


Sampling  from  Bulk  Tank  Trucks 

One way to minimize the cost of checking the hauler's sampling and the plant's
testing of bulk tank milk is to use samples taken by the tank truck drivers and samples
from the loaded tank truck at the plant. Samples taken from the loaded tanker, if
representative of the milk in the tanker, presumably contain the percentage of fat
equal to the average of the fat tests of all producers whose milk is in the tank,
weighted by the pounds of milk each one delivered. Therefore, if the fat test of the
tanker milk equals the weighted average of all producers' tests determined from the
driver's samples, within appropriate tolerances, the check-testing agency would
assume that the tests of the individual producers' milk were accurate and the driver's
samples were representative. This assumption does not rule out compensating errors,
since a low test for one producer might be balanced by a high test for another.
However, statistically determined tolerances afford a testing laboratory a most helpful
guide for detecting improper sampling of individual producers' milk. These tolerances
would represent expectations based on results obtained by unbiased testers working
under normal commercial laboratory conditions. The method appears to afford a
possibility of maintaining, at low cost, a constant check on the accuracy of sampling
and testing when it is used in conjunction with periodic check-testing of individual
producers' samples. In 1957 this practice was followed in eight Federal Order markets.

This  method  of  verification  presupposes  knowledge  of  the  differences  to  be 
expected  in  test  results  between  samples  of  milk  taken  properly  from  bulk  tank 
trucks  and  the  weighted  averages  of  tests  on  proper  samples  of  the  individual 
producers' milk.
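
The comparison underlying this method of verification can be sketched in a few
lines. The producers' pounds and tests, the tanker test, and the tolerance of one
point (taken here as 0.1 percent butterfat) are all assumptions for illustration,
not figures from the study.

```python
def weighted_average_test(pounds, tests):
    """Average of producers' fat tests, weighted by pounds of milk delivered."""
    total_pounds = sum(pounds)
    return sum(p * t for p, t in zip(pounds, tests)) / total_pounds


# Hypothetical tank load: pounds picked up and fat test for three producers.
pounds = [1200, 800, 2000]
tests = [3.6, 4.0, 3.8]

expected = weighted_average_test(pounds, tests)   # about 3.78 percent
tanker_test = 3.8                                 # hypothetical loaded-tanker test
TOLERANCE = 0.1                                   # assumed: one point

difference = tanker_test - expected
within_tolerance = abs(difference) <= TOLERANCE
print(round(expected, 3), round(difference, 3), within_tolerance)
```

As the text notes, agreement within tolerance does not rule out compensating
errors among the individual producers' tests.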

Because the representativeness of the sample obtained from the loaded tank truck
may vary from one sampling method to another, the amount and dispersion of
differences can vary. The amount of agitation given to the milk in the tank truck would
be expected to affect the representativeness of the sample.

A study was made to measure the amount of differences between the weighted
averages of tests on samples from individual producers' milk and tests on samples
of their commingled milk from bulk tank trucks. These experiments, made in three
different markets, compared the representativeness of results from sampling in
several ways both with and without agitating the tank load before sampling.

The agreement between the test of samples from a bulk tank truck and the
weighted average of tests of producers' samples was close when the tank load of
milk had been agitated (experiments 1 and 4, manhole sampling, and experiment 4,
inline sampling using automatic positive periodic sampler and samples from the
plant's holding tank, table 11).

Since a tank load of milk is agitated at least partially by pumping producers'
milk into the tank truck, the time between pumping the last milk into the tank truck
and the sampling tended to affect the agreement between the test of the commingled
milk in the tank truck and the weighted average of producers' tests when no further
agitation was given the milk in the tanker before sampling. This can best be seen by
comparing the differences for experiment 5 with those for the other experiments when
there was no agitation. However, among individual tank loads, the relation of time to
agreement differed widely. Undoubtedly factors such as size of fat globule and
viscosity of milk in individual loads, which affect creaming time, and condition of roads,
which could affect "surge" and therefore agitation, and the volume of the last pick-up
in relation to the volume of the milk already in the tank, modify this relationship.

The  size  and  dispersion  of  differences  between  tests  on  milk  from  the  tank  trucks 
and  the  weighted  average  of  producers'  tests  varied  considerably  from  one  method  of 
sampling to another. In experiment 2, with no agitation, the manhole samples tended
to  test  about  a  half  point  high  because  the  milk  had  started  to  cream.  A  high  proportion 
of  samples  from  the  valve  at  the  bottom  of  the  tank  tested  very  low  in  this  experiment, 
because  the  milk  had  started  creaming. 

The effect of creaming is shown more definitely by the valve samples in
experiment 5. Most samples taken at the beginning and during the middle of the unloading
tested low, while a large proportion of the samples taken at the end of the unloading,
and therefore from the top of the tank, tested very high.

In experiment 4, all sampling methods agreed closely (table 11). This may be
explained by the short time lapse between pumping the milk last picked up into the
tank truck and the sampling time.

The average size and direction of differences for each sampling method, with
notation as to agitation, also are shown in table 11, and the methods which resulted
in statistically significant biases from the weighted averages of producers' tests
are identified. The fairly wide average difference, -0.0215 percent butterfat, for
manhole sampling without agitation in experiment 1, was not statistically significant
because the differences for the five individual tankloads varied so much that an
average difference this large or larger had a probability of about 25 percent of
occurring on the basis of chance. On the other hand, in experiment 3b an average
difference of about the same size, -0.0207, was statistically significant because the
differences were consistently below the average of the individual producers' tests.
This consistent difference in one direction would have less than a 1 percent
probability of occurring on the basis of chance alone.

The results of these experiments indicate that reliably representative samples
can be obtained from loaded tank trucks only if the milk has been agitated before
sampling, regardless of the method of sampling. In some circumstances the amount
of agitation afforded by pumping producers' milk into the tank may be sufficient.

In general the use of loaded tanker testing as a check-test appears to have
certain useful applications. (1) Study of table 11 shows that when a tank load of milk
has been agitated, its test can be expected to agree within 1 point of the weighted
average of tests of individual producers' milk in the tanker, in over 95 percent of
trials. Therefore, a difference as large as 0.2 would be expected in less than 5
percent of tank loads. (2) For any bulk-tank route it is possible to make a frequency
distribution of differences between the loaded tanker test and the average of producers'
tests, over a number of comparisons. A frequency distribution of this kind could be
compared with results shown in table 11 to determine whether the results were in
reasonable agreement. For example, on the basis of tests on samples taken from
tanker manholes, after agitation, for 36 tank loads in experiments 1 and 4, the tank
sample can be expected to test within 0.04 percent of the weighted average of producers'
tests in about 95 percent of the trials. The probability that a test will differ from the
weighted average by 0.04 to 0.09 is between 4 and 5 percent. A difference of one point
or more would be expected in less than 1 percent of the comparisons.
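
A frequency distribution of this kind could be tallied as in the sketch below. The
ten route differences are invented, and the class limits are an assumed reading of
the text: differences are entered in hundredths of a percent butterfat and grouped
as 0.04 or less, 0.05 to 0.09, and 0.10 (one point) or more.

```python
from collections import Counter

CLASSES = ("within 0.04", "0.05 to 0.09", "0.10 or more")


def classify(diff_hundredths):
    """Place one difference (hundredths of a percent butterfat) in a class."""
    size = abs(diff_hundredths)
    if size <= 4:
        return CLASSES[0]
    if size <= 9:
        return CLASSES[1]
    return CLASSES[2]


def frequency_distribution(diffs_hundredths):
    """Return the percentage of comparisons falling in each class."""
    counts = Counter(classify(d) for d in diffs_hundredths)
    total = len(diffs_hundredths)
    return {label: 100.0 * counts[label] / total for label in CLASSES}


# Hypothetical differences for ten tank loads on one route: loaded tanker
# test minus weighted average of producers' tests.
route_diffs = [1, -2, 0, 3, -4, 6, -1, 2, 0, 11]
distribution = frequency_distribution(route_diffs)
print(distribution)
```

A route whose distribution departs widely from the expectations stated above
(about 95 percent, 4 to 5 percent, and less than 1 percent) would be a candidate
for closer check-testing.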


Adding  Preservatives  to  the  Farm  Sample 

At the time this study was undertaken, it was the practice in one of the markets
to add 4 drops of a 36-percent mercuric chloride solution to Babcock test bottles as
a preservative before pipetting the duplicate samples, in case a retest became
necessary. This eliminated the development of a sour smell (that is, prevented
bacteriological deterioration of the sample) and made the test bottles easier to clean
when the duplicate was held several days.

The 1955 edition of the Official Methods of Analysis of the Association of Official
Agricultural Chemists recommends that a "Tablet containing HgCl2 (mercuric
chloride), K2Cr2O7 (potassium dichromate), or other suitable preservative, weighing
not more than 0.5 gram for 8 fluid ounces of milk, or 36 percent solution of HCHO
(formaldehyde), 0.1 milliliter (2 drops) per fluid ounce, may be used..." (12). An
ounce of milk is about 30 ml., and the Babcock test requires 17.6 ml. of milk.
Therefore the use of 4 drops of solution with each pipetted sample gives a much larger
amount of solution per unit of milk than is recommended for composites.
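
The arithmetic behind this comparison can be worked through as follows. The only
assumption added here is the drop size of 0.05 ml., which is implied by the quoted
"0.1 milliliter (2 drops)"; the remaining figures are those given in the text.

```python
ML_PER_FLUID_OUNCE = 30.0      # "an ounce of milk is about 30 ml."
ML_PER_DROP = 0.1 / 2          # assumed from "0.1 milliliter (2 drops)"
BABCOCK_SAMPLE_ML = 17.6       # milk pipetted for one Babcock test

# Recommended rate: 0.1 ml. of solution per fluid ounce of milk.
recommended_per_ml = 0.1 / ML_PER_FLUID_OUNCE

# Practice in the market studied: 4 drops per pipetted sample.
used_per_ml = 4 * ML_PER_DROP / BABCOCK_SAMPLE_ML

ratio = used_per_ml / recommended_per_ml
print(round(recommended_per_ml, 4), round(used_per_ml, 4), round(ratio, 1))
```

On these figures the pipetted samples received roughly three to four times the
recommended amount of solution per unit of milk.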

Indications from previous research are that composite samples tend to test
lower than fresh samples. It was not known whether the use of excess amounts of
K2Cr2O7 would affect test results. For this reason, the present study was
undertaken to determine if pipetted samples to which four drops of a preservative had
been added would test significantly different from samples to which no preservative
had been added. This analysis was made on duplicate fresh samples taken from 50
farm bulk tanks.

Tests on samples to which preservatives had been added averaged 0.009 percent
butterfat above tests on corresponding samples without preservatives. On the basis
of the variance of the individual sample differences, an average as large as or
larger than 0.009 had a probability of occurrence of from 20 to 30 percent and would
not indicate a significant difference between the samples with preservative and those
with no preservative. Tests on 86 percent of the paired samples agreed, and the
remaining 14 percent differed by 1 point. This is very close agreement with the average
differences on identical, or split, samples of milk tested by pairs of technicians who
were accustomed to working together: 74 percent in agreement and 26 percent differing
by 1 point or more.




Transporting  Samples  From  the  Farm  to  the  Laboratory 

When  milk  samples  are  taken  from  farm  bulk  tanks  and  transported  to  the 
laboratory  for  testing,  the  motion  of  the  truck  may  churn  the  samples  and  cause  loss 
of  butterfat  in  the  fat  test.  Larger  particles  of  butter  in  churned  samples  cannot  be 
drawn into the pipette; hence, the milk delivered into the test bottle may be lower in
fat  than  the  milk  from  which  the  sample  was  drawn. 

Research by Ragsdale and others showed only "slight differences" in tests on 17
composite samples built and held at a laboratory and 17 duplicate composite samples
which were built at the farm but transported every other day to and from the farm in
the refrigerated sample compartment of a tank truck (18). That research, however,
does not throw light on the effect on test results of transporting fresh samples because
the composites were transported rather than the daily samples used to build the
composites.

This study was initiated to determine whether the transportation of fresh samples
from the farm to the laboratory affected the level of test or the variability of producers'
tests, either in testing fresh samples or composite samples. In one market, part of
each of 315 samples taken for fresh tests was heated to 68° F. and pipetted into test
bottles at the farm. This pipetted part of the sample and the remainder of the sample were
taken to the market administrator's laboratory in the sample compartment of the tank
truck. There, milk from the remainder of each sample was pipetted into test bottles,
and tests were made on the duplicate samples pipetted at the farm and laboratory. In
a second experiment in the same market, 158 fresh samples were taken from farm bulk
tanks and pipetted at the farm by the market administrator's technicians. A second
set of samples was collected at the farms by the driver of the tank truck, brought to
the plant, and pipetted by the market administrator's technicians.

In  addition  to  the  2  experiments  comparing  fresh  tests,  10  experiments  were  made 
with  164  pairs  of  composite  tests.  In  this  part  of  the  work  duplicate  composite  samples 
were  built  for  each  producer  whose  milk  was  tested.  One  of  these  was  built  at  the  farm, 
as  the  sample  was  taken,  and  kept  at  the  farm.  The  rest  of  the  daily  sample  was  taken 
to  the  laboratory  and  added  to  the  second  composites  which  were  held  at  the  laboratory. 
The  composite  held  at  the  farm  was  pipetted  there  before  being  taken  to  the  laboratory, 
where  all  of  the  samples  were  tested. 

In 7 of the 12 experiments, test results on the samples pipetted at the farm differed
significantly from results on samples pipetted at the laboratory (table 12). Damage to
samples from transportation would be expected to cause consistently lower tests on
the samples pipetted at the laboratories. Such consistency did not occur in the 7
experiments where differences were significant. In 4 of the 7 experiments, farm-pipetted
samples tested significantly higher than the samples pipetted at the laboratory; in the
other 3, farm-pipetted samples were below the laboratory samples. The combined
data for each of the 3 kinds of samples used in the experiments (fresh samples, 7-day
composites, and 15-day composites) showed no significant difference in test results
between farm pipetting and laboratory pipetting (table 12). These experiments were
performed in February, March, April, and November. No relationship existed between
the month and size of difference. These experiments do not indicate any damage to
samples in connection with transportation.

Samples  in  these  experiments  were  carefully  handled,  since  drivers  were  aware 
that  duplicate  tests  were  being  made.  It  is  entirely  possible  that  individual  drivers, 
particularly  in  hot  weather,  may  damage  samples  by  improper  handling  on  the  truck. 
This work has not measured the effect on tests of improper handling. To do so would
require controlled experiments in which duplicate tests were made, one on properly
handled samples, the other on samples which had been deliberately mishandled in
specific ways.

Table 12.--Number of samples pipetted at the farm that tested higher and lower in
butterfat than samples pipetted at the laboratory

                              Farm samples lower            Farm samples higher   Average difference,
                            than laboratory by--        No  than laboratory by--   farm sample minus
Type of sample                2 points   1 point difference  1 point  2 points    laboratory sample 1/
                              and more                                 and more
                                Number    Number    Number    Number    Number    Fat percent

Fresh samples:
  315 pairs                                    6       294        15              +0.008**
  158 pairs                                   36        77        35              -0.001
  Total, 473 pairs                            42       371        50              +0.005
    Percent                          1         9        78        11         1

Composite samples, 7-day:
  9 pairs                            0         0         3         4         2    +0.089**
  9 pairs                            0         3         5         1         0    -0.033*
  9 pairs                            0         0         4         5         0    +0.056**
  9 pairs                            0         3         4         2         0    -0.011
  Total, 36 pairs                    0         6        16        12         2    +0.025
    Percent                          0        17        44        33         6

Composite samples, 15-day:
  24 pairs                           0         0        19         5         0    +0.029**
  24 pairs                           1         6        15         2         0    -0.019*
  20 pairs                           0         3        14         3         0     0.000
  20 pairs                           0         6         9         5         0    -0.002
  20 pairs                           0         3        15         1         1    -0.020
  20 pairs                           2         7         8         3         0    -0.038*
  Total, 128 pairs                   3        25        80        19         1    -0.007
    Percent                          2        20        62        15         1

All composites:
  Total, 164 pairs                   3        31        96        31         3
    Percent                          2        19        58        19         2

1/ Asterisks indicate the level of significance of the average differences (test on
farm sample minus test on laboratory sample). In repeated trials, equal or greater
average differences could be expected to occur by chance in no more than: one percent
of the trials (**) or five percent of the trials (*).

Including  Defective  Daily  Portions  in  Building  Composite  Samples 

One  of  the  problems  in  building  composite  samples  is  whether  portions  should 
be  added  from  daily  samples  which  have  been  churned  or  from  milk  which  is  partly 
frozen.  For  purposes  of  this  study  three  plants  made  a  record  of  defective  daily 
samples  and  included  portions  from  them  in  some  of  their  composite  samples.  For 
over  90  percent  of  the  composite  samples  none  of  the  daily  samples  had  been  defective. 
For  each  producer,  tests  on  composites  containing  1  or  more  defective  daily  portions 
were  compared  with  averages  of  fresh  tests  for  those  days  of  the  same  period  on 
which  the  samples  were  not  defective.  Tests  on  composites  with  no  defective  portions 
were also compared with averages of fresh tests for the period (table 13).

The distributions of the differences for both series of 10-day and of 15-day
composites are shown in table 14. For each type of composite, 10-day and 15-day,
the two distributions of comparisons were shown, by chi-square tests, to differ
significantly at the 1-percent level.
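
The kind of chi-square comparison referred to above can be sketched as follows. This is a minimal illustration only: the counts are hypothetical (table 14 gives percentages, not the counts behind them), the grouping into three size classes is an assumption, and the function name is not from the report.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both series shared one distribution
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# HYPOTHETICAL counts, not taken from the report. Columns are three size
# classes of difference: above +0.09, within -0.09 to +0.09, below -0.09.
no_defective = [50, 400, 50]
some_defective = [20, 60, 20]

statistic = chi_square_statistic([no_defective, some_defective])
degrees_of_freedom = (2 - 1) * (3 - 1)
# With exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}, p = {p_value:.5f}")
```

With these made-up counts the probability falls below 0.01, which is what "significant at the 1-percent level" means for such a test.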

Both 10-day and 15-day composites with defective portions had a lower proportion
of comparisons agreeing within the limits -0.09 to +0.09 percent butterfat
than the composites with no defective portions (table 14). The average differences
for the two series of 10-day composites were not significantly different from each
other, but for the 15-day composites they were significantly different at the 5-percent
level. 7/ For both 10-day and 15-day composites, the average difference between
composite and fresh tests was greater for composites containing some defective
portions than for composites with no defective portions, although the direction
differed (plus for 10-day and minus for 15-day composites) (table 14).
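
The proportion agreeing within -0.09 to +0.09 can be tallied from the three middle classes of table 14. The short sketch below uses the percentages as read from that table; the variable names are assumptions made for illustration.

```python
# Percent of comparisons in the classes +.01 to +.09, 0.00, and -.01 to -.09,
# as read from table 14.
within_limits = {
    ("10-day", "no portions defective"): [39.1, 10.6, 35.0],
    ("10-day", "some defective portions"): [34.6, 13.5, 27.4],
    ("15-day", "no portions defective"): [28.8, 5.9, 46.1],
    ("15-day", "some defective portions"): [18.4, 8.3, 41.3],
}

shares = {key: round(sum(values), 1) for key, values in within_limits.items()}
for (period, series), share in shares.items():
    print(f"{period}, {series}: {share} percent within -0.09 to +0.09")
```

On these figures about 85 percent of the 10-day composites without defective portions agree within the limits, against about 76 percent of those with defective portions; for 15-day composites the shares are about 81 and 68 percent.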

The  distributions  of  differences  for  the  composite  tests  with  defective  portions 
were influenced by two factors: (1) Varying numbers of defective daily samples during
compositing  periods,  and  (2)  smaller  numbers  of  fresh  tests  in  the  averages  used  in 
comparisons  with  composites  which  included  some  defective  portions. 

Of the 431 10-day composite samples which included defective portions, 83
percent had one portion defective, 12 percent had two, and 5 percent had three. The
average number of defective portions per 10-day composite sample, and consequently
the average number of fresh tests omitted from the comparable average of fresh
tests for the average of 5.56 bulk tank deliveries during a 10-day period, was 1.22.
Of the 109 15-day composite samples which included some defective portions, 79
percent had one portion defective, 12 percent had two, and 9 percent had three.
The average number of defective portions per 15-day composite sample, representing
also the average number of fresh tests omitted from the comparable average of fresh
tests for the average of 8.45 bulk tank deliveries during a 15-day period, was 1.30.
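
The averages of 1.22 and 1.30 follow directly from the percentages quoted above. A brief sketch of the arithmetic, with the function and variable names chosen only for illustration:

```python
def average_defective(shares):
    """shares maps number of defective portions -> percent of composites."""
    return sum(count * pct for count, pct in shares.items()) / sum(shares.values())


ten_day = {1: 83, 2: 12, 3: 5}      # percent of the 431 10-day composites
fifteen_day = {1: 79, 2: 12, 3: 9}  # percent of the 109 15-day composites

avg_10 = average_defective(ten_day)
avg_15 = average_defective(fifteen_day)
print(f"10-day: {avg_10:.2f} defective portions per composite")
print(f"15-day: {avg_15:.2f} defective portions per composite")

# Fresh tests left in the comparable average, given the reported
# average numbers of bulk tank deliveries per period.
print(f"10-day: {5.56 - avg_10:.2f} fresh tests remaining of 5.56")
print(f"15-day: {8.45 - avg_15:.2f} fresh tests remaining of 8.45")
```

On average, then, the comparison for a composite with defective portions rests on roughly 4.3 fresh tests rather than 5.6 in a 10-day period, and roughly 7.2 rather than 8.5 in a 15-day period.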


7/  A  more  rigorous  test  of  the  effect  of  defective  samples  would  require  two 
series  of  samples,  one  including  defective  samples,  the  other  including  normal 
samples  for  all  days  of  the  testing  period.  Data  of  this  kind  would  be  difficult  to 
obtain.  It  might  be  feasible  to  obtain  defective  and  normal  samples  for  the  same 
lots  of  milk  under  laboratory  conditions,  where  conditions  could  be  controlled  to 
induce  churning  or  freezing  after  the  normal  sample  has  been  drawn. 
Table 14.--Percentage distribution of composite milk samples, with and without
defective portions, testing higher and lower in butterfat than the average of daily
fresh samples, by size of differences

                                   10-day composites             15-day composites
Difference:                                      Including                     Including
composite test minus          No portions   some defective  No portions   some defective
average of fresh tests 1/       defective         portions    defective         portions

Butterfat percentage              Percent          Percent      Percent          Percent

+.90 to +.99                          0.0              0.0          0.1              0.0
+.30 to +.39                           2/              0.5          0.0              0.0
+.20 to +.29                          0.1              0.2          0.0              0.0
+.10 to +.19                          7.2             10.9                           6.4
+.01 to +.09                         39.1             34.6         28.8             18.4
0.00                                 10.6             13.5          5.9              8.3
-.01 to -.09                         35.0             27.4         46.1             41.3
-.10 to -.19                          7.6             11.8         13.4             22.9
-.20 to -.29                          0.4              0.9          1.0              2.7
-.30 to -.39                          0.0              0.2          0.0              0.0
-.40 to -.49                          0.0              0.0          0.0              0.0
-.50 to -.59                           2/              0.0          0.0              0.0

Total                               100.0            100.0        100.0            100.0

Average difference,
butterfat percentage                .0005            .0023       -.0236           -.0395

1/ All averages of fresh tests are for nondefective samples only.
2/ Less than 0.05 percent.

Data for the three plants participating in this study were analyzed to determine
whether, and how much, the variance of the average of fresh tests of all daily samples
might differ from the variance of the averages after exclusion of defective daily
samples. Table 15 indicates that for each of the periods, the reduction in the number
of fresh tests per period would lead to an increase of 0.001 in the expected variance
of the average of fresh tests.

The variance due to fewer fresh tests could be expected to be random, with about
the same number of plus and minus variations, which would average close to zero.
For this reason the average differences in butterfat shown in table 14 probably show
very little effect from the smaller number of fresh tests in the averages used in
comparisons with the composites built with some defective portions. The estimated
increase of 0.001 in the variance of the average of fresh tests could result, for 95
comparisons out of 100, in an average being up to 0.002 higher or lower than the
average computed from the normal number of tests during the period. This would be
expected to influence the distribution of differences, but since 0.002 is small compared
with the differences shown in table 13 this factor would not change the conclusions:

1)  For  both  the  10-day  and  15-day  composites  the  distributions  of  differences 
from  averages  of  fresh  tests  are  significantly  different  for  composites  with 
no defectives and composites with some defective portions.

2)  For  15-day  composites,  comparisons  with  averages  of  fresh  tests  show 
average  results  that  differ  for  composites  with  no  defectives  and  those  with 
some  defective  portions  by  an  amount  which  could  be  expected,  due  to  chance 
alone,  in  not  more  than  5  percent  of  repeated  trials. 